Using Vector Embeddings For Sentiment Analysis

Rod Acosta, Kevin Furbish, Ibrahim Khan, Anthony Washington

Methods

What is a neural network?

  • A neural network is a type of algorithm loosely modeled on the structure and function of the human brain. Its goal is to process and analyze data in a broadly similar way.
  • Neural networks come in many types, but most of them share two common elements:
    • Artificial Neurons
    • Layers

Neural Network Layers

  • Neural networks usually have three types of layers:
    • Input layer
    • Hidden layers
    • Output layer
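
As a rough illustration of how data moves through the three layer types, here is a minimal sketch in plain Python. The layer sizes, weights, and biases are hypothetical values picked by hand (a real network would learn them during training), and the sigmoid activation on the hidden layer is an assumption made for simplicity.

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1); fine for small toy inputs.
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One fully connected layer: each artificial neuron computes a weighted
    # sum of its inputs, adds a bias, and applies the sigmoid activation.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hypothetical parameters, hand-picked for illustration only.
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # 3 neurons, 2 inputs each
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]                       # 1 neuron, 3 inputs
output_biases = [0.05]

input_values = [1.0, 2.0]                                      # input layer
hidden_values = layer(input_values, hidden_weights, hidden_biases)   # hidden layer
output_values = layer(hidden_values, output_weights, output_biases)  # output layer

print(hidden_values)
print(output_values)
```

Each call to `layer` corresponds to one layer of artificial neurons; stacking more calls between the input and the output would add more hidden layers.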

What are embeddings?

  • Embeddings are a technique that allows us to map words or phrases to corresponding vectors of real numbers, where the position and direction of each vector capture the word’s semantic meaning in relation to other words.
  • They make high-dimensional data such as words readable to our model and allow it to recognize and learn meaningful relationships and similarities between words.
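
To make the idea of mapping words to vectors concrete, the sketch below uses a tiny lookup table. The vocabulary, the 3-dimensional vectors, and the `<unk>` fallback token are all hypothetical and chosen by hand; in a trained model the embedding values are learned from data and usually have many more dimensions.

```python
# Hypothetical embedding table: word -> vector of real numbers.
# The values are made up so that "good" and "great" point in a similar
# direction while "bad" points roughly the opposite way.
embedding_table = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "bad": [-0.7, 0.1, 0.2],
    "movie": [0.0, 0.6, -0.1],
    "<unk>": [0.0, 0.0, 0.0],  # assumed fallback for words not in the table
}

def embed(tokens):
    # Replace each token with its vector, falling back to <unk> when unknown.
    return [embedding_table.get(token, embedding_table["<unk>"]) for token in tokens]

vectors = embed("great movie".split())
print(vectors)
```

After this step the model works with lists of numbers instead of raw text.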

Dense Layer & Cosine Similarity

  • Cosine Similarity
    • Measures the cosine of the angle between two non-zero vectors, providing a measure of similarity.
    • The smaller the angle, the higher the similarity between the two vectors.
    • \(\operatorname{cosine\_similarity}(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}\)
  • Dense Layer
    • A single-unit layer with a sigmoid activation function, equivalent to a logistic regression model, used for binary classification.
    • It outputs the probability that the input belongs to a positive class.
    • \(y = \sigma(W \cdot z + b)\)
    • Where:
      • z is the flattened input vector.
      • W is the weight vector.
      • b is the bias term.
      • \(\sigma(x) = \frac{1}{1+e^{-x}}\) is the sigmoid function.
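
The two formulas above can be sketched directly in plain Python. The function names `cosine_similarity`, `sigmoid`, and `dense` are chosen here for illustration, and the example vectors and weights are hypothetical rather than learned values.

```python
import math

def cosine_similarity(u, v):
    # (u . v) / (||u|| * ||v||); only defined for non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); adequate for the small values used here.
    return 1.0 / (1.0 + math.exp(-x))

def dense(z, W, b):
    # y = sigma(W . z + b), where z is the flattened input vector,
    # W is the weight vector, and b is the bias term.
    return sigmoid(sum(w * x for w, x in zip(W, z)) + b)

# Hypothetical example values.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # angle of 0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # angle of 90 degrees
probability = dense([0.5, -0.5], [1.0, 1.0], 0.0)           # W . z + b = 0

print(same_direction, orthogonal, probability)
```

Vectors pointing the same way score close to 1, perpendicular vectors score 0, and vectors pointing in opposite directions score close to -1. The dense layer returns a value strictly between 0 and 1 that can be read as the probability of the positive class.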

Sentiment Analysis

  • Using a neural network and its hidden layers (embedding and dense), together with cosine similarity, we can take inputs and classify them as positive or negative based on what our model has learned from our training dataset.
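
A toy version of the embedding and dense steps might look like the sketch below. Everything in it is an assumption for illustration: the 2-dimensional embeddings, the fixed input length `MAX_LEN`, the `<pad>` token, the 0.5 decision threshold, and the hand-picked weights that stand in for values a real model would learn from the training dataset. Cosine similarity is left out because the text does not specify where it enters the pipeline.

```python
import math

# Hypothetical embeddings and dense-layer parameters (not learned values).
EMBEDDINGS = {
    "good": [1.0, 0.2],
    "bad": [-1.0, 0.1],
    "movie": [0.0, 0.5],
    "<pad>": [0.0, 0.0],
}
MAX_LEN = 3                          # assumed fixed number of tokens per input
W = [2.0, 0.0, 2.0, 0.0, 2.0, 0.0]   # one weight per flattened input value
B = 0.0                              # bias term

def classify(text):
    # 1. Tokenize, then truncate or pad to MAX_LEN tokens.
    tokens = text.lower().split()[:MAX_LEN]
    tokens += ["<pad>"] * (MAX_LEN - len(tokens))
    # 2. Embedding layer: look up each token; unknown words map to <pad>.
    vectors = [EMBEDDINGS.get(token, EMBEDDINGS["<pad>"]) for token in tokens]
    # 3. Flatten the vectors into a single input vector z.
    z = [value for vector in vectors for value in vector]
    # 4. Dense layer: y = sigma(W . z + b).
    score = sum(w * x for w, x in zip(W, z)) + B
    probability = 1.0 / (1.0 + math.exp(-score))
    # 5. Threshold the probability to get a class label.
    label = "positive" if probability >= 0.5 else "negative"
    return label, probability

print(classify("good movie"))
print(classify("bad movie"))
```

With these made-up parameters, "good movie" lands on the positive side of the threshold and "bad movie" on the negative side.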